[Flow chart: provide each TSCS of samples as a column (or row) of entries in an m-by-n matrix M; apply a Hanning filter to each row (or column) of entries in M (optional; applied to STFT only); form a signal processing transform (SPT) for each row (or column) of filtered entries in M; form a selected combination of real and imaginary components of the filtered and transformed signal samples in each row (or column) of M; combine columns (or rows) end-to-end to provide a spectrogram for each window.]
SUB-AUDIBLE SPEECH RECOGNITION BASED UPON ELECTROMYOGRAPHIC SIGNALS

FIELD OF THE INVENTION

This invention relates to analysis of electromyographic signals produced in a human body.

BACKGROUND OF THE INVENTION

Communications between two or more humans, or between a human and a machine, are traditionally dominated by visual and verbal information and alphanumeric input. Efforts to automate human-to-human or human-to-machine communication, such as commercial speech recognition, have emphasized the audible aspects. A totally auditory communication strategy places a number of constraints on the communication channels, including sensitivity to ambient noise, a requirement for proper formation and enunciation of words, and use of a shared language. The physical limitations of sound production and recognition also become problematic in unusual environments, such as those involving hazardous materials (HAZMATs), extravehicular activity (EVA) space tasks, underwater operations and chemical/biological warfare (CBW). Conventional auditory expression may be undesirable for the private communication needed in many situations encountered daily, such as discreet or confidential telephone calls, offline or sotto voce comments during a teleconference call, certain military operations, and some human-to-machine commands and queries. Communication alternatives that are both private and independent of the production of audible signals are valuable.

One proposed method for studying alternative means of communication is direct understanding of brain signals, which bypasses speech and its analysis altogether. J. R. Wolpaw et al., "Brain-computer interface technology: a review of the first international meeting," IEEE Trans. on Rehabilitation Engineering, vol. 8 (2000) 164-171, recently published a review of the state of electroencephalograph (EEG) analysis. Several practical difficulties arise in near-term application of pure EEG approaches, because EEG relies on aggregated, surface-measured brain potentials. Additionally, one confronts the nonlinear complexity and idiosyncratic nature of the signals. The alternative, invasive EEG measurement and analysis, is not considered practical for widespread use, except in extreme medical conditions.

What is needed is a sub-audible communication system that provides one or more tiers, in addition to conventional audible communication, to exchange or transfer information compactly, reliably and reasonably accurately. Preferably, the amount of computation required should be modest and not out of proportion to the information obtained through the signal processing.

SUMMARY OF THE INVENTION 

These needs are met by the invention, which provides a system for receipt and analysis of sub-audible signals to estimate and provide a characterization of speech that is sotto voce or is not fully formed for purposes of normal speech recognition. This system relies on surface measurement of muscle signals (i.e., electromyographic or EMG signals) to discriminate or disambiguate speech signals produced with relatively little acoustic input. In one alternative, EMG signals are measured on the side of a subject's throat, near the larynx, and under the chin near the tongue, to pick up and analyze surface signals generated by a tongue (so-called electropalatogram or throat EPG signals). This approach relies on the fact that audible speech muscle control signals must be highly repeatable in order to be understood by others. These audible and sub-audible signals are intercepted and analyzed before sound is generated from them. The processed signals are fed into a neural network pattern classifier, and near-silent or sub-audible speech that occurs when a person "talks to himself or to herself" is processed. In this alternative, the tongue and throat muscles still respond, at a lowered intensity level, as if a word or phrase (referred to collectively herein as a "word") is to be made audible, with little or no external movement cues present. Where sufficiently precise sensing, optimal feature selection and good signal processing are available, it is possible to analyze these weak signals to perform, or direct performance of, useful tasks without conventional vocalization, thus mimicking an idealized thought-based approach.

In a training phase, R spoken instances of each word in a database of Q words are provided and processed, and the beginning and end of a sub-audible speech pattern ("SASP") is first determined for each instance within a window of temporal length 1-4 sec (preferably about 2 sec). A Signal Processing Transform (SPT) operation (characterized in the text) is performed on each data sample. The resulting transforms, evaluated at a selected sequence of transform parameter values, become entries in a matrix M, where a first matrix axis may represent different scale factors and a second matrix axis may represent a time associated with a window. The matrix M is tessellated into groups of cells (e.g., of rectangular shape), and each cell is represented by a feature for that cell. The cell features are rearranged as a vector having K entries $v_k(q;r)$. For each word (q) in the database and for each spoken instance (r) of a (known) word, a sum $S1(q;r)_h$ is formed of cell representative number k multiplied by a first set of weight coefficients $w_{1,k,h}(q;r)$ and summed over k=1, ..., K, and a first activation function (or functions) $A1\{S1(q;r)_h\}$ of the first sum is computed. The first activation value is multiplied by a second set of weight coefficients $w_{2,h,g}(q;r)$ and summed over h=1, ..., H to form a second sum $S2(q;r)_g$, and a second activation function (or functions) $A2\{S2(q;r)_g\}$ is computed. The weight coefficients $w_{1,k,h}(q;r)$ and $w_{2,h,g}(q;r)$ are adjusted, using a neural net learning procedure, to provide at least one reference activation function value $A(q;ref)_g$ for which the difference $\Delta 1(q;r)_g = |A(q;ref)_g - A2\{S2(q;r)_g\}|$ is no greater than a first error threshold $\varepsilon(thr;1)$ for all spoken instances r=1, ..., R and all values of the output index g, with corresponding sets of weight coefficients $\{w_{1,k,h}(q;ref)\}$ and $\{w_{2,h,g}(q;ref)\}$. This completes the training phase of the invention.

An SASP including an unknown word is provided and sampled in a sequence of windows, as in the training phase. Signal Processing Transforms are computed for the sample values in each of these windows, a matrix M' is formed and tessellated into cells, and a representative value for each cell is optionally normalized, as before. The representative cell values are formatted as a vector with entries $v_k$. Using the already-determined weights $w(q;ref)$ for each word (q), a sum $S'(q;ref)$ of the vector entries $v_k$, multiplied by the corresponding weights $w(q;ref)$, is formed, an activation function value $A\{S'(q;ref)\}$ is computed, and differences $\Delta 2(q)_g = |A(q;ref)_g - A\{S'(q;ref)\}_g|$ are computed and compared with a second error threshold $\varepsilon(thr;2)$. If at least one word (q=q0) can be found for which $\Delta 2(q0) \le \varepsilon(thr;2)$, at least one word (q=q0) with minimum value $\Delta 2(q0)$ is interpreted as corresponding to the unknown word.

The first phase of the technique is a learning procedure, whereby the system learns to distinguish between different known words in a database and provides reference sets of neural net weight coefficients for this purpose. In a second, word recognition phase, the weight coefficients are applied to one or more unknown words to determine whether an unknown word is sufficiently similar to a word in the database. This technique provides several advantages, including minimization of word variations through use of a shared language and shared sound production; a potential to connect the sub-audible signal recognition to a flexible, highly developed speech recognition architecture; non-invasive sensing; reasonably robust response to physiological variations; and privacy.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 illustrates placement of signal recording electrodes in an initial experiment on sub-audible speech analysis.

FIGS. 2A-2F are graphical views of sub-audible signals 
corresponding to the generic words “stop”, “go”, “left”, 
“right”, “alpha” and “omega.” 

FIG. 3 illustrates a simplified neural network classifier, with one hidden layer, that may be applied in practicing the invention.

FIGS. 4 and 5 are high level flow charts of procedures for practicing a training procedure and a word recognition procedure according to the invention.

FIGS. 6-10 are flow charts of intermediate procedures 
associated with the steps in the FIG. 4 or FIG. 5 flow chart. 

DESCRIPTION OF BEST MODES OF THE INVENTION

In some initial tests, sub-audible pronunciations of six English words ("stop", "go", "left", "right", "alpha" and "omega") and ten numerals ("1", "2", "3", "4", "5", "6", "7", "8", "9" and "0") were recorded for each of three subjects, ages 55, 35 and 24, to provide a control set of words for a small graphic model that might be used to provide commands on a Mars Rover system, for example. The words "alpha" and "omega" may be used to enter a command to move faster or slower, or up or down, or forward or backward, as appropriate under the circumstances. EMG data were collected for each subject using two pairs of self-adhesive Ag/AgCl electrodes, located near the left and right anterior external area of the throat, about 0.25 cm back from the chin cleft and about 1.5 cm from the right and left sides of the larynx, as indicated in FIG. 1. Initial results indicate that one pair of electrodes (or more pairs, if desired), located diagonally between the cleft of the chin and the larynx in a non-symmetrical relationship, will suffice for recognition in small word sets. Signal grounding usually relies on attachment of an additional electrode to the right or left wrist or another location on the body. When data are acquired using wet electrodes, each electrode pair is connected to a commercial Neuroscan or equivalent signal amplifier and recorder that records the EMG responses at a sampling rate of up to 20 kHz. A 60 Hz notch filter is used to reduce ambient signal interference.
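The 60 Hz notch filtering mentioned above can be illustrated with a minimal sketch. It assumes a standard second-order (biquad) IIR notch using the widely published Audio EQ Cookbook coefficient formulas. The function names, the quality factor q=30 and the 2 kHz sampling rate are illustrative assumptions, not details of the Neuroscan hardware or software.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch (Audio EQ Cookbook form), normalized so that a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def notch_filter(x, f0=60.0, fs=2000.0, q=30.0):
    """Hypothetical helper: apply the notch with a direct-form I difference equation."""
    b, a = notch_coefficients(f0, fs, q)
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def tone(freq, fs=2000.0, seconds=2.0):
    """Unit-amplitude test sinusoid."""
    n = int(fs * seconds)
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]

def peak_after(y, skip):
    """Largest magnitude after the start-up transient has been skipped."""
    return max(abs(v) for v in y[skip:])
```

Once the start-up transient decays, a pure 60 Hz tone is driven toward zero, while a 10 Hz component passes almost unchanged. A higher q gives a narrower notch but a longer transient.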

One hundred or more exemplars for each word were initially recorded for each subject over a six-day interval, in morning and afternoon sessions. In a first group of experiments, the signals were sectioned offline into two-second time windows with variable window start times, and extraneous signals (coughs, swallows, body noises, etc.) were removed using SCAN 4 Neuroscan software. FIGS. 2A, 2B, 2C, 2D, 2E and 2F graphically illustrate representative EMG blocked signals for six windows, corresponding to the words "stop", "go", "left", "right", "alpha" and "omega", respectively. The blocked signals for these words are not wholly reproducible and may be affected by the test subject's health and by the time of day at which the particular signal is recorded and analyzed. The technique must also take into account the changing signal-to-noise ratio and/or changing amplitudes of the signals.

For signal feature processing, Matlab scripts were developed to provide a uniform signal processing system from recording through network training. These routines were used to receive and transform the raw signals into feature sets, to dynamically apply a threshold to the transformed signals, to compensate for changes in electrode locations, to adjust signal-to-noise ratios, and to implement neural network algorithms for pattern recognition and training. EMG artifacts, such as swallowing, muscle fatigue tremors and coughs, were removed during preprocessing of the windowed samples. In a real time application, artifact filters would be incorporated and applied whenever new words are introduced into the lexicon.

Sectioned signal data for each word were transformed into usable classifier feature vectors using preprocessing transforms, combined with a coefficient reduction technique. Several transforms were tested, including: (i) a short time interval Fourier Transform (STFT), requiring multiple overlapping windows; (ii) discrete wavelets (DWTs) and continuous wavelets (CWTs) using Daubechies 5 and 7 bases; (iii) dual tree wavelets (DTWTs) with a near_sym_a 5,7 tap filter and a Q-shift 14,14 tap filter; (iv) Hartley Transforms; (v) Linear Predictive Coding (LPC) coefficients; and (vi) uniformly and nonuniformly weighted moving averages. Feature sets were created differently for each of these transforms, because each signal processing approach yields different pattern discriminations.

The most effective real time SPTs were the windowed STFTs and the DTWT coefficient matrices, each of which was post-processed to provide associated feature vectors. One suitable procedure is the following. Transform coefficient vectors are generated for each word using, for example, the STFT or the DWT applied to the magnitude (absolute value) of the raw signal amplitude. Where unipolar, rather than bipolar, electrodes are used, positive and negative sign signals are distinguishable, and STFTs and DWTs could be applied to the raw signal amplitudes without first forming an absolute value. Vectors were post-processed using a Matlab routine to create a matrix of spectral coefficients. This matrix is tessellated into a set of sub-matrices or cells, depending upon the spectral information complexity. Tessellation sizes were determined in part by average signal energy in a given region of the spectral matrix. Both equal and unequal segmentation sizes were considered. A representative value was calculated for each candidate sub-matrix, to reduce the number of features or variables presented to the pattern recognition algorithm and to represent average coefficient energy.

A simple mean or average signal energy within a cell was used as a cell representative or "feature." Other first order statistical values, such as medians, modes and maximum sub-matrix values, can be used but appear to provide no substantial improvement over use of a simple mean of signal energy. The result of this approach is a fixed length feature vector for each sub-audible word tested. Dual tree wavelets are attractive here, as opposed to standard discrete wavelets, because they minimize the normal wavelet sensitivity to phase shifts. Continuous wavelets (CWTs) are not presently practical for real time computations. The Hartley Transform, which provides additional information on signal behavior along a non-real line in the transform plane, was also explored, as was use of moving averages of various lengths.
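The tessellate-and-average step can be illustrated with a sketch. It assumes uniform, non-overlapping rectangular cells whose size divides the matrix dimensions, so the unequal cells also considered above are not covered. The helper `cell_features` is hypothetical. It uses numpy and returns the fixed-length feature vector row by row. Applied to the 4x6 example matrix of Eq. (3) with 2x3 cells, it gives the cell means 2.5, 7, 9.17 and 10.83 of Eq. (4). The standard deviation variant uses numpy's default population formula, which is our assumption.

```python
import numpy as np

def cell_features(m, cell_rows, cell_cols, stat=np.mean):
    """Hypothetical helper: tessellate m into non-overlapping
    cell_rows x cell_cols cells and return one representative value per
    cell, flattened row by row (K = number of cells)."""
    m = np.asarray(m, dtype=float)
    rows, cols = m.shape
    if rows % cell_rows or cols % cell_cols:
        raise ValueError("cell size must divide the matrix dimensions")
    # Axes become (row block, row within cell, column block, column within cell).
    blocks = m.reshape(rows // cell_rows, cell_rows, cols // cell_cols, cell_cols)
    # Reduce over the two within-cell axes to get one value per cell.
    reps = stat(blocks, axis=(1, 3))
    return reps.ravel()

M = np.array([
    [1, 2, 3, 4, 5, 6],
    [1, 3, 5, 7, 9, 11],
    [2, 6, 12, 20, 15, 8],
    [3, 18, 14, 7, 9, 6],
])

means = cell_features(M, 2, 3)              # cell means (simple average)
stds = cell_features(M, 2, 3, stat=np.std)  # population standard deviations
```

Passing a different `stat` (for example `np.median` or `np.max`) gives the other first order statistics mentioned above.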
Feature vectors for each instance of a word are used to train a neural network (NN) word recognition engine. Accuracy of recognition is evaluated using about 20 percent of the word exemplars, which are withheld from training, and signals from only one electrode pair, randomly drawn from the collection of electrode pairs in a data recording session.

Five NN paradigms were considered for signal training classification, using the entire feature set: (1) scaled conjugate gradient nets; (2) Levenberg-Marquardt nets; (3) probabilistic neural nets (PNNs); (4) modified dynamic cell structure (DCS) nets; and (5) linear classifiers. After comparison of the results, a scaled conjugate gradient net was chosen, for the following reasons. A Levenberg-Marquardt net reaches the lowest mean square error level but requires too much system memory when dealing with large data sets, even where reduced memory variations are used. A low mean square error (MSE) does not necessarily produce improved generalization for new signals when high sensor noise is present. PNN nets provide reasonable classifications but require very large training sample sizes to reach stable probabilities and do not appear to be superior in ultimate pattern discrimination ability. A dynamic cell structure (DCS) net provides fast net training, which is attractive for real time adaptation, but is less compact for the anticipated applications, which are memory sensitive. A scaled conjugate gradient network has fast convergence with adequate error levels for the signal-to-noise ratios encountered in the data, and its performance is comparable to the Levenberg-Marquardt performance. The scaled conjugate gradient network uses a "trust" region gradient search criterion, which may contribute to the superior overall results of this approach.

In other EMG tasks, we successfully applied Hidden Markov Models (HMMs), but these appear to be most effective for non-multi-modal signal distributions, such as those associated with single discrete gestures, rather than for the temporally non-stationary, sub-audible signal patterns of concern here. An HMM approach also requires sensitive pretraining to accurately estimate transition probabilities. A hybrid HMM/neural net approach is an alternative.

In order to quickly explore many experimental situations using different transform variations, we have operated in a simulated real time environment developed and used at NASA Ames, wherein EMG signals are recorded to file and are later used to train and test the signal recognition engines. Our initial three test subjects were not given immediate feedback about how well their sub-audible signals were recognized. However, some learning occurred as each test subject was permitted to view the subject's EMG signals.

FIG. 3 illustrates a simplified example of a neural network classifier 31 with one hidden layer, configured to analyze a vector of feature values provided according to the invention. The NN configuration 31 includes a first (input) layer 32 having four input nodes, numbered k=1, ..., K (K=4 here), a second (hidden) layer 33 having two intermediate nodes, numbered h=1, ..., H (H=2 here), and a third (output) layer 34 having three output nodes, numbered g=1, ..., G (G=3 here). A practical neural net classifier may have tens or hundreds of input nodes, hidden layer(s) nodes and output nodes. The input values $v_k$ received at the first layer of nodes are summed and a first activation function A1 is applied to produce

$$U_h = A1\Big\{\sum_{k=1}^{K} w_{1,k,h}\, v_k + b\Big\} \qquad (h=1,\ldots,H) \qquad (1)$$

where the quantities $w_{1,k,h}$ are weight coefficients connecting the nodes in the first layer to the nodes in the second layer and b is a bias number. The intermediate values received at the second layer are summed and a second activation function A2 is applied to produce

$$A2\Big\{\sum_{h=1}^{H} w_{2,h,g}\, U_h\Big\} \qquad (g=1,\ldots,G) \qquad (2)$$

where the quantities $w_{2,h,g}$ are weight coefficients connecting the nodes in the second layer to the nodes in the third layer. Here, A1 and A2 may be, but need not be, the same activation function, and more than one activation function can be used in a given layer. More than one hidden layer can be included by obvious extensions of the notation. This formalism will be used in the following development of the NN analysis in FIGS. 4 and 5.
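The FIG. 3 forward pass (K=4 inputs, H=2 hidden nodes, G=3 outputs) can be sketched as follows. This assumes tanh for A1 and the logistic function for A2, and it places the bias b only in the hidden layer, as in Eq. (1). The function name `forward` and all weights and inputs are arbitrary illustrative values, not values from the experiments.

```python
import math

def forward(v, w1, w2, b, a1=math.tanh, a2=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """Hypothetical one-hidden-layer classifier in the notation of Eqs. (1)-(2).
    v  : K input features v_k
    w1 : K x H weights, w1[k][h] connects input k to hidden node h
    w2 : H x G weights, w2[h][g] connects hidden node h to output g
    b  : bias added to each hidden-layer sum
    """
    K, H, G = len(v), len(w2), len(w2[0])
    # Eq. (1): U_h = A1{ sum_k w1[k][h] * v_k + b }
    u = [a1(sum(w1[k][h] * v[k] for k in range(K)) + b) for h in range(H)]
    # Eq. (2): output g = A2{ sum_h w2[h][g] * U_h }
    return [a2(sum(w2[h][g] * u[h] for h in range(H))) for g in range(G)]

v = [0.2, 0.5, 0.1, 0.7]
w1 = [[0.1, 0.4], [0.3, 0.2], [0.5, 0.6], [0.2, 0.1]]
w2 = [[0.7, 0.2, 0.5], [0.1, 0.9, 0.3]]
outputs = forward(v, w1, w2, b=0.05)
```

The activations are passed as arguments because the text allows A1 and A2 to differ.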

Training Procedure. 

The term "training procedure," according to one embodiment of the invention, includes the following actions: (1) receive R spoken instances of a sub-audible EMG signal for at least one known word; (2) detect the beginning of each SASP containing an instance, using a thresholding procedure; (3) for each SASP, create a window, having a selected length Δt(win), that includes the SASP; (4) apply a "signal processing transform" (SPT) to each instance of one of the SASPs; (5) form a matrix (which can be one-dimensional, a vector) from the SPT values for each instance of the SASP; (6) tessellate the matrix into cells, with each cell represented by a cell "feature," for each instance; (7) (re)format the cell features as entries or components of a vector; (8) (optionally) normalize the vector entries; (9) receive the vector entries for each instance of the SASP in a neural network classifier; (10) for all instances of each word, identify sets of reference weight coefficients for the vector entry values that provide a best match to a reference pattern corresponding to the words considered; and (11) use the reference weight coefficients in a neural network analysis of an unknown word received by the system.

FIG. 4 is a high level flow chart illustrating a procedure for practicing a training procedure according to the invention. In step 41, a sequence of length Δt(win) = 1-4 sec (preferably, Δt(win) ≈ 2 sec) of sampled signal values is received, and a sample thresholding operation is performed to determine where, in the sequence, a sub-audible speech pattern (SASP) begins. SASPs representing samples of a known word at a selected rate (e.g., about 2 kHz) are identified, recorded and optionally rectified. Signal rectification (optional) replaces the signal at each sampling point by the signal magnitude. R spoken instances, numbered r=1, ..., R (R ≥ 10), of a given word (SASP) are preferably used for training the system to recognize that SASP.
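Step 41 leaves the thresholding rule open. The following sketch assumes that the SASP begins at the first sample whose rectified amplitude exceeds a multiple of a baseline noise level. That noise level is estimated as the RMS of the leading samples. The function names, the 200-sample baseline and the factor of 4 are hypothetical choices.

```python
import math

def rectify(x):
    """Optional rectification: replace each sample by its magnitude."""
    return [abs(s) for s in x]

def find_onset(x, baseline_len=200, factor=4.0):
    """Hypothetical onset detector: index of the first sample whose magnitude
    exceeds factor * RMS(first baseline_len samples), or None if none does."""
    base = x[:baseline_len]
    rms = math.sqrt(sum(s * s for s in base) / len(base))
    thr = factor * rms
    for i, s in enumerate(rectify(x)):
        if s > thr:
            return i
    return None

def cut_window(x, onset, fs=2000.0, win_sec=2.0):
    """Extract a window of length win_sec (here 2 s at 2 kHz) starting at the onset."""
    n = int(fs * win_sec)
    return x[onset:onset + n]
```

A real recording would need the artifact removal described earlier, because a cough or swallow would also cross such a threshold.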

In step 42, a Signal Processing Transform operation is performed on the pattern SASP over the window length Δt(win) for each spoken instance r=1, ..., R, and for each word, numbered q=1, ..., Q in a database, to provide a spectrum for the received signal for each of the windowed samples. As used herein, a "Signal Processing Transform" (SPT) has a finite domain (compact support) in the time variable, provides a transform dependent upon at least one transform parameter (e.g., window length, number of samples used in forming the transform, scale factor, frequency, etc.), allows summation or integration over this parameter, and a collection of these transforms for different values of the transform parameter is mathematically complete.

The SPT operation in step 42 may rely upon short time interval Fourier transforms (STFTs); discrete wavelets (DWTs) and continuous wavelets (CWTs) using Daubechies 5 and 7 bases; dual tree wavelets (DTWTs) with a near_sym_a 5,7 tap filter and a Q-shift 14,14 tap filter; Hartley Transforms; Linear Predictive Coding (LPC) coefficients; uniformly and nonuniformly weighted moving averages; or any other suitable transforms. The spectrum obtained by this operation (expressed as a function of one or more transform parameters) is a sequence of data transform samples, formatted as an m-row-by-n-column matrix M (or as a vector, with m=1 or n=1) having a first matrix axis (along a row) and a second matrix axis (along a column), with each matrix entry representing a concentration or intensity associated with a scale factor and/or window time. In a preferred embodiment, for a wavelet SPT, the n columns (e.g., n=30) represent an increasing sequence of window times for constant scale factor, and the m rows (e.g., m=129) represent a dyadic sequence of scale factors used to provide the spectrum for a given window time. Alternatively, the m rows may represent window times and the n columns may represent scale factors. A sequence of further operations is performed on the matrix, as discussed in the following.
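For the STFT case, the matrix M can be built in four steps. Split the window into overlapping segments, apply a Hanning taper to each segment, transform it, and combine the real and imaginary parts. Each transformed segment then becomes one column, so rows correspond to frequency and columns to segment start time. The sketch below is a hedged example using numpy. The function name `stft_matrix`, the segment length of 64 and the hop of 32 are illustrative assumptions. The magnitude sqrt(re^2 + im^2) is only one possible "selected combination" of real and imaginary components.

```python
import numpy as np

def stft_matrix(x, seg_len=64, hop=32):
    """Hypothetical STFT feature matrix: rows = frequency bins,
    columns = increasing segment start times within the window."""
    x = np.asarray(x, dtype=float)
    taper = np.hanning(seg_len)                   # Hanning window (optional step)
    cols = []
    for s in range(0, len(x) - seg_len + 1, hop):
        spec = np.fft.rfft(taper * x[s:s + seg_len])
        # Combine real and imaginary parts; here the magnitude.
        cols.append(np.hypot(spec.real, spec.imag))
    # Place the columns end to end to form the spectrogram for this window.
    return np.column_stack(cols)

fs = 2000.0
t = np.arange(int(2.0 * fs)) / fs     # one 2 s window sampled at 2 kHz
x = np.sin(2 * np.pi * 125.0 * t)     # test tone centred on bin 4 (125 Hz)
M = stft_matrix(x)
```

With these settings M has 33 frequency rows (spacing fs/64 = 31.25 Hz) and 124 time columns, and the 125 Hz test tone peaks in row 4 of every column.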

In step 43, the matrix entries (e.g., wavelet coefficients) are tessellated or decomposed into "cells," with each cell representing a grouping of adjacent matrix entries (e.g., a rectangular grouping of one or more sizes), where the entries in a given cell resemble each other according to one or more criteria and associated metric(s). A matrix may be divided into uniform size cells or may be divided according to statistical similarity of cell entries or according to another criterion.

As an example, consider the following 4x6 matrix:

$$M = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 5 & 7 & 9 & 11 \\ 2 & 6 & 12 & 20 & 15 & 8 \\ 3 & 18 & 14 & 7 & 9 & 6 \end{bmatrix} \qquad (3)$$

Tessellation of the matrix entries into the four 2x3 non-overlapping groups of entries in this example may depend, for example, upon the relative sizes of the entries in the matrix M. More generally, each cell is represented by a "feature" associated therewith, which may be one or more associated numerical coefficient values, such as the entries in the matrix $\langle M\rangle$ shown in Eq. (4), or a maximum or minimum value from the cell entries.

In step 44, each cell representative value or feature in the tessellated matrix is optionally normalized by dividing this value by (i) a sum of all values of the cell representatives, (ii) a sum of the magnitudes of all values of the cell representatives, (iii) the largest magnitude of the cell representative values or (iv) another selected sum. Alternatively, a normalized cell representative value is formed as a difference between the cell representative value and a mean value for that population, divided by a standard deviation value for that population. Alternatively, a normalized cell representative value is formed as a difference between the cell representative value and a maximum or minimum cell representative value for the tessellated matrix. One goal of normalization is to reduce the dynamic range of the cell representative values for each instance r=1, ..., R and each word q=1, ..., Q.
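The step 44 normalization alternatives can be gathered into one function, sketched here with numpy. The `method` keyword and its option names are hypothetical labels for the options in the text. Options (i)-(iii) divide by a sum or a largest magnitude. The z-score uses the mean and population standard deviation of the cell values, and the last two subtract the maximum or minimum cell value. Option (iv), "another selected sum," is left to the caller.

```python
import numpy as np

def normalize_cells(c, method="max_abs"):
    """Hypothetical helper implementing the step 44 normalization options."""
    c = np.asarray(c, dtype=float)
    if method == "sum":          # (i) divide by the sum of all cell values
        return c / c.sum()
    if method == "sum_abs":      # (ii) divide by the sum of magnitudes
        return c / np.abs(c).sum()
    if method == "max_abs":      # (iii) divide by the largest magnitude
        return c / np.abs(c).max()
    if method == "zscore":       # subtract the mean, divide by the standard deviation
        return (c - c.mean()) / c.std()
    if method == "minus_max":    # difference from the maximum cell value
        return c - c.max()
    if method == "minus_min":    # difference from the minimum cell value
        return c - c.min()
    raise ValueError(f"unknown method: {method}")

cells = np.array([2.5, 7.0, 55 / 6, 65 / 6])  # cell means of the 4x6 example
```

The dividing options fail if the divisor is zero (for example, all cells equal for "zscore"), so a practical version would guard that case.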

For the example matrix M of Eq. (3), the matrix may also be expressed as a vector or single stream of data entries. If one decomposes this matrix M into four 2x3 non-overlapping rectangular groups of entries (cells), the corresponding arithmetic means of the four cells become

$$\langle M\rangle = \begin{bmatrix} 2.5 & 7 \\ 9.17 & 10.83 \end{bmatrix} \qquad (4)$$

which can represent each of the four cells. The corresponding standard deviations of the four cells can be used in the same way.

In step 45, the (normalized) cell representative values determined in step 44 are arranged as a vector of length K (K = number of cells) or other suitable entity for subsequent processing.

In step 46, the vector entries $v_k(q;r)$ are received and processed by a neural net (NN) classifier by multiplying each vector entry $v_k(q;r)$ by a first set of weight coefficients $w_{1,k,h}(q;r)$ ($0 \le w_{1,k,h} \le 1$; k=1, ..., K; h=1, ..., H) and summing these weighted values to form

$$S1(q;r)_h = \sum_{k=1}^{K} w_{1,k,h}(q;r)\, v_k(q;r) \qquad (h=1,\ldots,H) \qquad (6)$$

This process is repeated for each of the R spoken instances of the known word. Each of the weighted sums $S1(q;r)_h$ becomes an argument in a first activation function $A1\{S1(q;r)_h\}$, discussed in the following, also in step 46. Also in step 46, a second set of sums is formed

$$S2(q;r)_g = \sum_{h=1}^{H} w_{2,h,g}(q;r)\, A1\{S1(q;r)_h\} \qquad (g=1,\ldots,G) \qquad (7)$$

which becomes an argument in a second activation function $A2\{S2(q;r)_g\}$.

In step 47, the system provides a set of reference values $\{A(q;ref)_g\}_g$ for the word number q and computes a set of differences

$$\Delta 1(q) = \frac{1}{R\cdot G}\sum_{r,g}\bigl|A2\{S2(q;r)_g\} - A(q;ref)_g\bigr|^{p} \qquad (8)$$

where p is a selected positive number. The system determines, in step 48, whether $\Delta 1(q) \le \varepsilon(thr;1)$, where $\varepsilon(thr;1)$ is a selected threshold error, preferably 0.01 or less.

If the answer to the query in step 48 is "yes," the system accepts this estimated reference set, in step 49, for use in the word recognition procedure illustrated in FIG. 5. If the answer to the query in step 48 is "no," the first and second weight coefficients, $w_{1,k,h}(q;r)$ and $w_{2,h,g}(q;r)$, are adjusted in step 50, using a neural net analysis approach, to provide another set of estimated reference values $A(q;ref)_g$, and steps 46-48 are repeated. In the neural net analysis, a gradient method is applied to a geometric space with coordinates $w_{1,k,h}(q;r)$ and $w_{2,h,g}(q;r)$, as discussed subsequently.
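Eq. (8) and the step 48 test translate directly into code. In this pure-Python sketch, `outputs[r][g]` stands for $A2\{S2(q;r)_g\}$ for instance r and output node g, and `ref[g]` stands for $A(q;ref)_g$. The function names are hypothetical. The one-hot reference pattern, p = 2 and $\varepsilon(thr;1) = 0.01$ are example settings, not values taken from the experiments.

```python
def delta1(outputs, ref, p=2.0):
    """Eq. (8): average over instances r and output nodes g of |A2 - A_ref|**p."""
    R, G = len(outputs), len(ref)
    total = sum(abs(outputs[r][g] - ref[g]) ** p for r in range(R) for g in range(G))
    return total / (R * G)

def accept_reference(outputs, ref, eps_thr1=0.01, p=2.0):
    """Step 48: True means accept the weights (step 49); False means the
    caller adjusts the weights (step 50) and repeats steps 46-48."""
    return delta1(outputs, ref, p) <= eps_thr1

ref = [1.0, 0.0, 0.0]               # assumed reference pattern for word q
outputs = [[0.95, 0.05, 0.02],      # R = 2 spoken instances, G = 3 outputs
           [0.90, 0.10, 0.00]]
```

The choice of p matters. With p = 2 these outputs pass the 0.01 threshold, while with p = 1 the same outputs fail it, because small deviations are penalized more heavily.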

In the procedure illustrated in FIG. 4, two suitable activation functions are

$$A\{S\} = \tanh(a\cdot S) = \frac{\exp(a\cdot S) - \exp(-a\cdot S)}{\exp(a\cdot S) + \exp(-a\cdot S)} \qquad (9A)$$

$$A\{S\} = \frac{1}{1 + \exp(-a\cdot S)} \qquad (9B)$$

having the respective ranges (-1, 1) and (0, 1) for $-\infty < S < \infty$, where a is a selected positive number. Other monotonically increasing, finite range functions can also be used as activation functions.
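The two activation functions of Eqs. (9A) and (9B) are shown below in pure Python, with the slope parameter a as an argument. The function names are hypothetical. The checks confirm the stated ranges and the standard identity tanh(x) = 2/(1 + exp(-2x)) - 1 relating the two forms. That identity is a general mathematical fact, not a statement from the source.

```python
import math

def act_tanh(s, a=1.0):
    """Eq. (9A): tanh(a*S), range (-1, 1)."""
    return math.tanh(a * s)

def act_logistic(s, a=1.0):
    """Eq. (9B): 1 / (1 + exp(-a*S)), range (0, 1)."""
    z = a * s
    # Branch so that exp() never receives a large positive argument (no overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

In floating point, large |a·S| saturates to exactly -1, 0 or 1, which is why the checks below use inclusive bounds.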

For each word q, each reference value $A(q;ref)_g$ (q=1, ..., Q) may be determined by different first reference sets of weight coefficients $\{w_{1,k,h}(q;ref)\}_{k,h}$ and/or by different second reference sets of weight coefficients $\{w_{2,h,g}(q;ref)\}_{h,g}$, which are now fixed for the word number q. The reference values $A(q;ref)_g$ and the associated first and second reference sets of weight coefficients will henceforth be used for comparison with not-yet-identified SASP words. Optionally, the NN has F hidden layers and F+1 sets of weight coefficients (F ≥ 1).

In an alternative embodiment, in steps 46-50, a first universal set of weight coefficients $\{w_{1,k,h}(ref)\}_{k,h}$ and a second universal set of weight coefficients $\{w_{2,h,g}(ref)\}_{h,g}$, not dependent upon the particular word (q), replace the first and second sets of weight coefficients $\{w_{1,k,h}(q;ref)\}_{k,h}$ and $\{w_{2,h,g}(q;ref)\}_{h,g}$. In this alternative embodiment, where the database includes at least two words, the order of the instances of different (transformed) words must be randomized, and the neural network classifier seeks to identify first and second universal sets of weight coefficients, $\{w_{1,k,h}(ref)\}_{k,h}$ and $\{w_{2,h,g}(ref)\}_{h,g}$, that are accurate for all words in the database.

Word Recognition Procedure. 

The word recognition procedure, according to one embodiment of the invention, includes the following actions: (1) receive a sub-audible EMG signal representing an unknown word; (2) detect the beginning of an SASP, using a thresholding procedure; (3) create a window, having a selected length Δt(win), that includes the SASP; (4) create a sequence of time-shifted windowed versions of the received SASP, with time shifts equal to a multiple of a time displacement value Δt(displ); (5) apply a signal processing transform (SPT) to each of the time-shifted versions of the SASP; (6) form a matrix (which can be one-dimensional, a vector) from the SPT values for each of the time-shifted versions of the SASP; (7) tessellate the matrix into cells, with each cell represented by a cell "feature"; (8) (re)format the cell features as entries or components of a vector; (9) (optionally) normalize the vector entries; (10) receive the vector entries for each time-shifted version of the SASP in a trained neural network classifier, and identify a word from a database that provides a best match to an activation function value corresponding to each time-shifted version of the SASP; (11) accumulate a point for each best match; and (12) identify a word, if any, with the highest point count as the best match to a word corresponding to the received SASP.

FIG. 5 is a high-level flow chart of a word recognition procedure that uses the results of the training procedure shown in FIG. 4. In step 51, a sub-audible signal pattern (SASP) representing a sample of a “new” (unknown) word (referred to as number q′) is received and optionally rectified. A sequence of sample values is received at the selected rate used in FIG. 4. A sample thresholding operation is performed to determine where, in the sequence, the sub-audible speech pattern (SASP) begins. A sequence of J time-shifted, partially overlapping windows, numbered j = 1, ..., J (J ≥ 2), is formed from the signal representing the new word, with consecutive start times displaced by multiples of a selected displacement time Δt(displ), such as a value between 0 and Δt(win)/2.

In step 52, an SPT operation is performed on the new SASP over the window length Δt(win), to provide a first spectrum for the new word for each of the windowed samples. In step 53, the matrix entries are tessellated or decomposed into the same cells that were used for each word in step 43 of FIG. 4. In step 54, each cell representative value or feature in the tessellated matrix is optionally normalized. In step 55, the cell representative values are arranged as a vector V′ having vector entries $v'_k$ (k = 1, ..., K) or other suitable entity for subsequent processing. In step 56, the first and second reference sets of weight coefficients, $\{w_{1,k,h}(q;\mathrm{ref})\}_k$ and $\{w_{2,h,g}(q;\mathrm{ref})\}_h$ (or $\{w_{1,k,h}(\mathrm{ref})\}_k$ and $\{w_{2,h,g}(\mathrm{ref})\}_h$), used to compute the activation function reference value $A2\{S2(q;\mathrm{ref})\}_g$ (or $A2\{S2(\mathrm{ref})\}_g$) for the word number q, are used to compute an activation function $A2\{S2'(q';\mathrm{ref})\}_g$, as in Eq. (6).
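Eq. (6) lies outside this section, so the following is only a minimal Python sketch of how the two-layer activation of step 56 might be computed from a feature vector V′. It assumes, as a hypothetical choice, a logistic sigmoid for both activation functions A1 and A2 (the text only requires them to be monotonically increasing) and no bias terms; the names `forward`, `w1` and `w2` are illustrative, with `w1[k][h]` standing for $w_{1,k,h}$ and `w2[h][g]` for $w_{2,h,g}$.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: one possible monotonically increasing activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(v, w1, w2):
    """Two-layer forward pass (hypothetical sketch, no bias terms).

    v  : list of K cell features v'_k
    w1 : K x H nested list; w1[k][h] plays the role of w_{1,k,h}
    w2 : H x G nested list; w2[h][g] plays the role of w_{2,h,g}
    Returns [A2{S2'_g} for g in 0..G-1].
    """
    K, H, G = len(v), len(w1[0]), len(w2[0])
    # First sum S1_h = sum_k w1[k][h] * v_k, passed through A1.
    a1 = [sigmoid(sum(w1[k][h] * v[k] for k in range(K))) for h in range(H)]
    # Second sum S2_g = sum_h w2[h][g] * A1_h, passed through A2.
    return [sigmoid(sum(w2[h][g] * a1[h] for h in range(H))) for g in range(G)]
```

Replacing `sigmoid` with any other monotonically increasing function (for example `math.tanh`) leaves the structure of the computation unchanged.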

In step 57, the system computes differences

$$\Delta 2(q;q';j)_g = \left| A2\{S2'(q';j;\mathrm{ref})\}_g - A(q;\mathrm{ref})_g \right| \qquad (10)$$

for each word (q) in the database, for each time-shifted window (j) and for each NN third layer index g. Optionally, in step 58, only those words (q) in a “reduced” database RDB for which

$$\Delta 2(q;q';j)_g \le \varepsilon(\mathrm{thr};2)$$

is satisfied are considered in accumulating points in step 59, where ε(thr;2) is a selected second threshold error, preferably 0.01 or less. Optionally ε(thr;1) = ε(thr;2), but this is not required.

In step 59, for each time-shifted window (numbered j = 1, ..., J), the word (q) in the database (or in the reduced database RDB) that provides the smallest value $\Delta 2(q;q';j)_g$ among the set of values computed in Eq. (11) is given one point or vote. In step 60, the word (q) in the database with the largest number of points is interpreted as the unknown word (q′) that was received. Optionally, the point(s) accumulated according to the minimum value of $\Delta 2(q;q';j)_g$ can be weighted, for example, by multiplying the number 1 by a weight function $WF\{\Delta 2(q;q';j)_g\}$ that is monotonically decreasing as the argument $\Delta 2(q;q';j)_g$ increases. Two examples of suitable weighting functions are

$$WF(s) = a + b\,\exp\{-\alpha s\}, \qquad (12A)$$

(12B)

where a, b, c, α and β and the product d·e are non-negative numbers, not all 0. If two or more words (e.g., q1 and q2) in the database have substantially the same largest point accumulation, the system optionally interprets this condition, in step 61, as indicating that no clearly best match to the unknown word (q′) is available.
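Steps 57 to 61 can be sketched as a voting loop. This is a hypothetical Python illustration, not the patented implementation: it casts one vote per window j and per output index g (one vote per window when G = 1), uses the exponential form of Eq. (12A) with illustrative defaults a = 0, b = 1, α = 1 as the optional weighting, and treats a points tie within `tie_tol` as the "no clearly best match" outcome of step 61. The names `recognize` and `wf_12a` are illustrative.

```python
import math
from collections import Counter

def wf_12a(s, a=0.0, b=1.0, alpha=1.0):
    """Weight function of the form of Eq. (12A): WF(s) = a + b*exp(-alpha*s)."""
    return a + b * math.exp(-alpha * s)

def recognize(a2_new, a_ref, eps2=None, weight_fn=None, tie_tol=1e-9):
    """Vote over time-shifted windows (steps 57-61), hypothetical sketch.

    a2_new    : per window j, the list of outputs A2{S2'(q';j;ref)}_g
    a_ref     : dict mapping word q -> list of reference values A(q;ref)_g
    eps2      : optional eps(thr;2); words with larger Delta2 are excluded (step 58)
    weight_fn : optional WF(s); None means one uniform point per vote
    Returns the winning word, or None when no clearly best match exists.
    """
    points = Counter()
    for outputs in a2_new:
        for g, out_g in enumerate(outputs):
            # Eq. (10): Delta2(q;q';j)_g for every candidate word q.
            diffs = {q: abs(out_g - ref[g]) for q, ref in a_ref.items()}
            if eps2 is not None:  # reduced database RDB (step 58)
                diffs = {q: d for q, d in diffs.items() if d <= eps2}
            if not diffs:
                continue
            best = min(diffs, key=diffs.get)  # step 59: smallest Delta2 gets the vote
            points[best] += 1.0 if weight_fn is None else weight_fn(diffs[best])
    if not points:
        return None
    ranked = points.most_common()
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= tie_tol:
        return None  # step 61: no clearly best match
    return ranked[0][0]  # step 60: word with the most points
```

Because the weighted variant feeds the winning Δ2 value itself into WF, windows that match a word only loosely contribute less than a full point.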

FIG. 6 sets forth in more detail a first embodiment of a thresholding operation for step 41 in FIG. 4. In step 71, two or more moving averages of consecutive sequences of H1 sampled values and H2 sampled values are formed (H1 < H2), where, for example, H1 = 10 and H2 = 20 is a suitable choice. Initially, the sample amplitudes and both moving averages are substantially 0, except for the presence of noise. As the system encounters the beginning of a sub-audible speech pattern (SASP), the shorter H1-sample moving average will rise before the longer H2-sample moving average rises, when applied to consecutive sample runs with the same starting point. In step 72, the system determines if the moving average of the H1-samples is at least a multiple μ of the moving average of the H2-samples, where μ is a selected ratio ≥ 1. If the answer to the query in step 72 is “no,” the system returns to step 71 and continues to receive samples and to form the two moving averages. If the answer to the query in step 72 is “yes,” the system infers that an SASP is present and that an “SASP threshold” has been crossed, and the system begins to divide succeeding time intervals into epochs, in step 73. Other methods of determining when an SASP threshold has been crossed can also be used here.
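A minimal Python sketch of the two-moving-average test in steps 71 and 72 follows. It is an interpretation under stated assumptions: both averages are taken over trailing runs that end at the current sample (so that the shorter run responds first to an onset), the samples are rectified first, and the default μ = 1.5 is an illustrative value chosen below the ratio H2/H1 = 2 that a noise-free step would produce. `detect_onset` is a hypothetical name.

```python
def detect_onset(samples, h1=10, h2=20, mu=1.5):
    """Return the first index n at which the H1-average is >= mu times the H2-average.

    Both averages are over trailing runs ending at sample n (an assumption).
    Returns None if no SASP threshold crossing is found.
    """
    if not 0 < h1 < h2:
        raise ValueError("need 0 < h1 < h2")
    mags = [abs(x) for x in samples]            # optional rectification
    for n in range(h2 - 1, len(mags)):
        m1 = sum(mags[n - h1 + 1:n + 1]) / h1   # step 71: short moving average
        m2 = sum(mags[n - h2 + 1:n + 1]) / h2   # step 71: long moving average
        if m2 > 0 and m1 >= mu * m2:            # step 72: ratio test
            return n                            # step 73 would start epoching here
    return None
```

With a constant noise floor the two averages are equal (ratio 1), so μ > 1 suppresses false triggers while still firing within a few samples of a genuine onset.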

In step 74 of FIG. 6, a set of signal samples is received, preferably as a stream of data, and the magnitude or absolute value of each SASP signal sample is (optionally) formed. In step 75, a consecutive sequence CS of the signal samples is formed within an event window, preferably of length Δt(win) = 1-4 sec. In step 76, the system creates a new sequence TSCS of time-shifted consecutive sequences, with the beginning of each TSCS shifted by a selected time delay Δt(displ) relative to the immediately preceding TSCS. Each TSCS will be processed and classified by a neural network classifier. The number of (above-threshold, consecutive) TSCSs may be used as a parameter in the comparisons in FIG. 4. The system then proceeds to step 42 of FIG. 4 and continues.
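The segmentation in steps 74 to 76 can be sketched as follows, assuming a known sampling rate `fs` (Hz) and an onset index supplied by a thresholding routine such as the one of FIG. 6. The function `time_shifted_windows` and its defaults (Δt(win) = 1 s, within the 1-4 sec range given above, and Δt(displ) = 0.25 s) are illustrative choices, not values prescribed by the text.

```python
def time_shifted_windows(samples, fs, onset, win_s=1.0, displ_s=0.25,
                         max_windows=None, rectify=True):
    """Build time-shifted consecutive sequences (TSCSs) starting at `onset`.

    Window j covers round(win_s*fs) samples starting at onset + j*round(displ_s*fs);
    windows that would run past the end of `samples` are not formed.
    """
    win = int(round(win_s * fs))
    step = int(round(displ_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window length and displacement must be positive")
    tscs = []
    start = onset
    while start + win <= len(samples):
        if max_windows is not None and len(tscs) >= max_windows:
            break
        seq = samples[start:start + win]
        # Step 74 (optional): use magnitudes |x| of the samples.
        tscs.append([abs(x) for x in seq] if rectify else list(seq))
        start += step
    return tscs
```

For example, with `fs=4`, a 1 s window and a 0.5 s displacement, ten samples starting at index 2 yield three overlapping 4-sample TSCSs.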

FIG. 7 illustrates a dynamic threshold adjustment procedure, relying in one implementation on a Bollinger band, that may be used in step 41. In step 81, a sequence of T amplitudes “a” of the signal is received and stored. In step 82, a mean (μ) and standard deviation (σ) are computed for the stored sequence. In step 83, the system determines if the magnitude of the difference |u − μ| is at least equal to L·σ for at least one amplitude u in the stored sequence, where L is a selected positive number (e.g., L = 4-10). If the answer to the query in step 83 is “no,” the system replaces the stored sequence by a new sequence (e.g., shifted by one sample value), in step 84, and returns to step 82; no threshold has yet been crossed in this situation. If the answer to the query in step 83 is “yes,” a threshold has been crossed within the stored sequence, and a position representing the beginning of the word can be identified, in step 85.
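Steps 81 to 85 can be sketched with Python's `statistics` module. As assumptions: the buffer slides by one sample per iteration (the example given for step 84), σ is the population standard deviation, a buffer with σ = 0 is treated as containing no crossing, and the index of the first out-of-band amplitude serves as the word-start position of step 85. `bollinger_onset` is a hypothetical name.

```python
import statistics

def bollinger_onset(samples, T=50, L=4.0):
    """Return the index of the first amplitude lying outside mu +/- L*sigma, or None.

    A buffer of T amplitudes (step 81) slides forward one sample at a time (step 84).
    """
    for start in range(len(samples) - T + 1):
        buf = samples[start:start + T]
        mu = statistics.fmean(buf)        # step 82: mean
        sigma = statistics.pstdev(buf)    # step 82: population standard deviation
        if sigma == 0:
            continue                      # flat buffer: treated as no crossing (assumption)
        for i, u in enumerate(buf):
            if abs(u - mu) >= L * sigma:  # step 83: Bollinger-band test
                return start + i          # step 85: candidate word start
    return None
```

Note that a single outlier in an otherwise constant buffer of length T lies √(T − 1) standard deviations from the mean, so L must be below √(T − 1) for an isolated spike to be detectable.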

FIG. 8 is a flow chart providing more detail on step 42 in FIG. 4, where a Fourier transform is used for the SPT operation. In step 91, the data stream is optionally reformatted into a sequence of columns (or rows) of signal samples, with each column (or row) corresponding to a TSCS, according to the format required for computer analysis. In step 92, a Hanning filter is optionally applied to each STFT window. In step 93, an SPT operation is performed for each row (or column) of (filtered) signal samples. The particular SPT used may be a conventional Fourier transform (applied to a window of finite width), a dual tree wavelet transform, a Daubechies wavelet transform, a Hartley transform, a moving average with uniform or non-uniform weights, or similar transforms. The particular choice will depend upon the known characteristics of the data received for analysis. Preferably, the SPT of the signal sample sequences will provide real and imaginary components that can be combined and processed as appropriate. In step 94, the system forms a selected combination of real and imaginary components of the (filtered and transformed) signal samples in each row (or column). In step 95, the columns (or rows) are combined, end-to-end, to provide a spectrogram for each (time-overlapped) window.
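A dependency-free Python sketch of steps 91 to 95 for the Fourier-transform case follows. The choices are assumptions rather than the patent's specification: each TSCS is multiplied by a symmetric Hann (Hanning) window, a naive O(n²) DFT stands in for an FFT, the "selected combination" of real and imaginary parts is taken to be the magnitude √(Re² + Im²), and only the non-negative frequency bins are kept. The resulting matrix M has one row per frequency bin and one column per TSCS.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann ("Hanning") window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitude(x):
    """One-sided DFT magnitudes |X_f| = sqrt(Re^2 + Im^2), f = 0..n//2 (naive O(n^2))."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(n // 2 + 1)]

def spectrogram(tscs_list, use_hann=True):
    """Steps 91-95: transform each TSCS and place the results side by side.

    Returns M as a list of rows (frequency bins) by columns (one per TSCS).
    """
    columns = []
    for seq in tscs_list:
        w = hann(len(seq)) if use_hann else [1.0] * len(seq)  # step 92 (optional)
        windowed = [s * wi for s, wi in zip(seq, w)]
        columns.append(dft_magnitude(windowed))               # steps 93-94
    return [list(row) for row in zip(*columns)]               # step 95: combine end-to-end
```

In practice an FFT library routine would replace `dft_magnitude`; the naive loop is used here only to keep the sketch self-contained.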

FIG. 9 is a flow chart providing more detail on step 43 in FIG. 4, according to a first embodiment for tessellation of the matrix M. In step 101, the entries within the matrix M are decomposed into non-overlapping, rectangularly shaped cells of one or more selected sizes (e.g., 2×3 or 5×5 or 10×7) so that every entry belongs to precisely one cell. Cells adjacent to a boundary of the matrix M may have a different (residual) size. In step 102, a first order statistical coefficient m1 (e.g., arithmetic mean, median, mode or largest value) is computed for, and associated with, each cell, representing an average magnitude or other feature for the entries within the cell. A second order statistical coefficient m2 (e.g., standard deviation) is optionally computed for each cell. Here, the individual values within each cell may be substantially different, so that the first order coefficient m1 associated with a given cell may not be very representative of the individual entries. However, the cells in this embodiment are of fixed size, which is useful in some of the following computations. At one extreme, each cell may be a single entry in the matrix M.
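For the fixed-size tessellation of steps 101 and 102, a minimal Python sketch follows. It assumes M is a list of rows, m1 is the arithmetic mean and m2 the population standard deviation; the cell size and the name `tessellate_fixed` are illustrative. Concatenating the m1 values in the returned order gives a feature vector of the kind arranged in step 55.

```python
import statistics

def tessellate_fixed(M, rows=2, cols=3):
    """Split M into non-overlapping rows x cols cells; edge cells may be smaller.

    Returns a list of (r0, c0, m1, m2) with m1 = arithmetic mean and
    m2 = population standard deviation of the entries in each cell.
    """
    n_rows, n_cols = len(M), len(M[0])
    cells = []
    for r0 in range(0, n_rows, rows):
        for c0 in range(0, n_cols, cols):
            # Residual cells at the right/bottom boundary are clipped to M.
            vals = [M[r][c]
                    for r in range(r0, min(r0 + rows, n_rows))
                    for c in range(c0, min(c0 + cols, n_cols))]
            cells.append((r0, c0, statistics.fmean(vals), statistics.pstdev(vals)))
    return cells
```

For a 3×4 matrix and 2×3 cells this yields one full cell plus three residual cells (2×1, 1×3 and 1×1), so every entry belongs to exactly one cell.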

FIG. 10 is a flow chart of an alternative embodiment for tessellation of the matrix M (step 43 in FIG. 4). In step 111, the matrix entries are tentatively aggregated into “pre-cells,” with each pre-cell initially being a single entry and having a second order statistical coefficient m2 of 0. Consider a general pre-cell, such as a rectangular set E of entries, having a selected first order statistical coefficient m1 (arithmetic mean or median or mode) and having a second order statistical coefficient m2 no larger than a selected positive threshold value σ(thr). In step 112, an expanded pre-cell set E′, having one more row or one more column than E, is formed, and statistical coefficients m1(E′) and m2(E′) are computed for this pre-cell E′. In step 113, m2(E′) is compared with the threshold value σ(thr). If the coefficient m2(E′) for the expanded set E′ is no larger than the threshold value σ(thr), the pre-cell is redefined, in step 114, to include the expanded set E′; the redefined set E′ is further expanded in step 115 by one row or one column to form a new set E″, and the system returns to step 112. If m2(E′) is larger than the threshold σ(thr), the expanded set E′ is rejected, the pre-cell includes the set E but not this particular expanded set E′, and the system returns to step 112. However, another expanded set can be formed from E by adding a different row or column, and the coefficient m2 for this new expanded set can be computed and compared with σ(thr). At some level, the system identifies a rectangular or other shape set E″ of maximum size whose coefficient m2(E″) is no larger than the threshold value σ(thr), and this maximum size set becomes a cell. This process is repeated until every entry belongs to a cell and every entry in a cell is “similar” to every other entry in that cell, as measured by the threshold value σ(thr). The number of matrix entries has thereby been reduced to a smaller number of cells. The cells may be rectangular but do not necessarily have the same size. In this approach, the entries in a cell are represented by the coefficient m1 for that cell, but the cell size is determined by the adjacent entries for which m2(E) ≤ σ(thr), so that the entries may be more “similar” to each other.
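The region-growing rule of FIG. 10 admits many implementations; the Python sketch below is one hypothetical greedy version, not necessarily the one intended. It seeds a pre-cell at the first unassigned entry in row-major order, repeatedly tries to add one row below or one column to the right (rejecting any expansion that would push m2 above σ(thr) or overlap an existing cell), and uses the population standard deviation as m2 and the mean as m1. The name `tessellate_adaptive` is illustrative.

```python
import statistics

def tessellate_adaptive(M, sigma_thr):
    """Greedy rectangular region growing in the spirit of FIG. 10 (hypothetical).

    Returns a list of (r0, c0, r1, c1, m1): inclusive corners and the cell mean.
    """
    n_rows, n_cols = len(M), len(M[0])
    assigned = [[False] * n_cols for _ in range(n_rows)]
    cells = []

    def values(r0, c0, r1, c1):
        return [M[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

    def is_free(r0, c0, r1, c1):
        return not any(assigned[r][c]
                       for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))

    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if assigned[r0][c0]:
                continue
            r1, c1 = r0, c0  # step 111: single-entry pre-cell, m2 = 0
            grown = True
            while grown:     # steps 112-115: try E' = E plus one row or one column
                grown = False
                for nr1, nc1 in ((r1 + 1, c1), (r1, c1 + 1)):
                    if (nr1 < n_rows and nc1 < n_cols
                            and is_free(r0, c0, nr1, nc1)
                            and statistics.pstdev(values(r0, c0, nr1, nc1)) <= sigma_thr):
                        r1, c1, grown = nr1, nc1, True  # step 114: accept E'
                        break
            for r in range(r0, r1 + 1):
                for c in range(c0, c1 + 1):
                    assigned[r][c] = True
            cells.append((r0, c0, r1, c1, statistics.fmean(values(r0, c0, r1, c1))))
    return cells
```

Because each expansion only absorbs unassigned entries and every seed is unassigned, the cells are exhaustive and mutually exclusive, as the tessellation requires; the order of the two trial expansions (down first, then right) is an arbitrary choice that affects the cell shapes.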

One practical approach for neural network training is backpropagation of errors, together with conjugate gradient analysis to identify global minima. This approach is discussed, for example, by T. Masters in Practical Neural Network Recipes in C++, Morgan Kaufmann Publ., 1993, pp. 102-111.

With reference to step 50 in FIG. 4 in the preceding, a conjugate gradient algorithm with trust region (to limit the extension in any direction in coordinate space) is applied to the error term sum, ε(q) with q fixed, to determine an extremum point (minimum) for the received cell representatives. For example, the basic Fletcher-Reeves algorithm can be utilized, wherein a direction of steepest descent

$$p_0 = -g_0 \qquad (13)$$

for the surface or function is first identified; a line search is performed to estimate the optimal distance $\alpha_k$ to be moved along the current search direction $p_k$,

$$x_{k+1} = x_k + \alpha_k p_k; \qquad (14)$$

and a conjugate direction

$$p_k = -g_k + \beta_k p_{k-1} \qquad (15)$$

is determined for the new search direction. For the Fletcher-Reeves update, the parameter $\beta_k$ is chosen according to

$$\beta_k = \frac{g_k^{T} g_k}{g_{k-1}^{T} g_{k-1}}. \qquad (16)$$

For the Polak-Ribière update, the parameter $\beta_k$ is chosen according to

$$\beta_k = \frac{\Delta g_{k-1}^{T} g_k}{g_{k-1}^{T} g_{k-1}}, \qquad (17)$$

where $\Delta g_{k-1} = g_k - g_{k-1}$ is the preceding change in the gradient. In any conjugate gradient approach, it is preferable to periodically reset the search direction to the steepest descent direction. In a particular approach developed by Powell and Beale, resetting occurs when little orthogonality remains between the present gradient and the preceding gradient; the corresponding test is whether the inequality

$$\left| g_{k-1}^{T} g_k \right| \ge 0.2\, \lVert g_k \rVert^2 \qquad (18)$$

is satisfied. Other variations on the corresponding algorithms can also be used here.
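The update rules in Eqs. (13) to (18) can be exercised on a small problem. The Python sketch below is illustrative only: it minimizes a quadratic f(x) = ½ xᵀAx − bᵀx (A symmetric positive definite) rather than a neural-net error surface, because for a quadratic the line search of Eq. (14) has the closed form α_k = −g_kᵀp_k / (p_kᵀAp_k); a general error function would need a numerical line search, and the trust region mentioned above is omitted. The `update` switch selects Eq. (16) or Eq. (17), and Eq. (18) triggers a reset to steepest descent. All names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, x0, update="FR", tol=1e-10, max_iter=100):
    """Minimize f(x) = 0.5 x^T A x - b^T x with nonlinear-CG update rules.

    g = A x - b is the gradient; the exact line search is specific to quadratics.
    update: "FR" (Fletcher-Reeves, Eq. 16) or "PR" (Polak-Ribiere, Eq. 17).
    """
    x = list(x0)
    g = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    p = [-gi for gi in g]                                  # Eq. (13): p0 = -g0
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = -dot(g, p) / dot(p, Ap)                    # exact line search
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # Eq. (14)
        g_new = [gi + alpha * api for gi, api in zip(g, Ap)]
        if abs(dot(g, g_new)) >= 0.2 * dot(g_new, g_new):  # Eq. (18): Powell-Beale reset
            beta = 0.0
        elif update == "FR":
            beta = dot(g_new, g_new) / dot(g, g)           # Eq. (16)
        else:
            dg = [gn - go for gn, go in zip(g_new, g)]     # Delta g = g_k - g_{k-1}
            beta = dot(dg, g_new) / dot(g, g)              # Eq. (17)
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]  # Eq. (15)
        g = g_new
    return x
```

On an n-dimensional quadratic with exact line searches, both updates reproduce linear conjugate gradients and terminate in at most n steps (up to rounding), which is why a 2×2 test problem converges in two iterations.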

What is claimed is: 

1. A method for training and using a system to identify a sub-audible signal formed by a source of sub-audible sounds, the method comprising providing a computer that is programmed to execute, and does execute, the following actions:

(1) receiving R signal sequences, numbered r = 1, ..., R (R ≥ 2), with each sequence comprising an instance of a sub-audible speech pattern (“SASP”), uttered by a user, and each SASP including at least one word drawn from a selected database of Q words, numbered q = 1, ..., Q with Q ≥ 2;

(2) estimating where each of the R SASPs begins and ends 
in the sequences; 

for each of the signal sequences, numbered r = 1, ..., R:

(3) providing signal values of a received signal, number r, within a temporal window having a selected window width Δt(win); and

(4) transforming each of the R SASPs, using a Signal 
Processing Transform (“SPT”) operation to obtain an 
SPT value that is expressed in terms of at least first and 
second transform parameters comprising at least a 
signal frequency and a signal energy associated with 
the SASP; 

(5) providing a first matrix M with first matrix entries equal 
to the SPT values for the R SASPs, ordered according to 
the at least first and second transform parameters along 
a first matrix axis and along a second matrix axis, respec- 
tively, of the matrix M; 

(6) tessellating the matrix M into a sequence of exhaustive and mutually exclusive cells of matrix entries, referred to as M-cells, with each M-cell containing a collection of contiguous matrix entries, where each M-cell is characterized according to at least one selected M-cell criterion;



(7) providing, for each M-cell, an M-cell representative 
value, depending upon at least one of the first matrix 
entries within the M-cell; 

(8) formatting the M-cell representative values as a vector V with vector entry values $v_k(q;r)$, numbered k = 1, ..., K (K ≥ 2);

(9) analyzing the vector entry values $v_k(q;r)$ using a neural net classifier, having a neural net architecture, and a sequence of estimated weight coefficient values associated with at least one of the neural net classifier layers, where the neural net classifier provides a sequence of output values dependent upon the weight coefficient values and upon the vector entry values $v_k(q;r)$;

(10) receiving the vector entries $v_k(q;r)$ and forming a first sum

$$S1(q;r)_h = \sum_k w_{1,k,h}(q;r)\, v_k(q;r),$$

where $\{w_{1,k,h}(q;r)\}_k$ is a first selected set of adjustable weight coefficients that are estimated by a neural net procedure;

(11) forming a first activation function $A1\{S1(q;r)_h\}$ that is monotonically increasing as the value $S1(q;r)_h$ increases;

(12) forming a second sum

$$S2(q;r)_g = \sum_h w_{2,h,g}(q;r)\, A1\{S1(q;r)_h\} \qquad (g = 1, \ldots, G;\ G \ge 1),$$

where $\{w_{2,h,g}(q;r)\}_h$ is a second selected set of adjustable weight coefficients that are estimated by the neural net procedure;

(13) forming a second activation function $A2\{S2(q;r)_g\}$ that depends upon the second sum $S2(q;r)_g$ and that is monotonically increasing as the value $S2(q;r)_g$ increases;

(14) providing a set of reference output values $\{A(q;\mathrm{ref})_g\}$ as an approximation for the sum $A2\{S2(q;r)_g\}$ for the R instances of the SASP;

(15) forming a difference

$$\Delta 1(q) = \frac{1}{RG} \sum_r \sum_g \left| A2\{S2(q;r)_g\} - A(q;\mathrm{ref})_g \right|^{p1},$$

where p1 is a selected positive exponent;

(16) comparing the difference Δ1(q) with a selected threshold value ε(thr;1);

(17) when Δ1(q) is greater than ε(thr;1), adjusting at least one of the weight coefficients $w_{1,k,h}(q;r)$ and the weight coefficients $w_{2,h,g}(q;r)$, returning to step (10), and repeating the procedures of steps (10)-(16); and

(18) when Δ1(q) is no greater than ε(thr;1), interpreting this condition as indicating that at least one of an optimum first set of weight coefficients $\{w_{1,k,h}(q;r;\mathrm{opt})\}$ and an optimum second set of weight coefficients $\{w_{2,h,g}(q;r;\mathrm{opt})\}$ has been obtained, and using the at least one of the first set and second set of optimum weight coefficients to receive and process a new SASP signal and to estimate whether the received new SASP signal corresponds to a reference word or reference phrase in the selected database.

2. The method of claim 1, wherein said computer is further programmed to execute, and does execute, said step (18) by a procedure comprising the following actions:

(19) receiving a new sub-audible speech pattern SASP signal uttered by said user containing an instance of at least one unknown word, referred to as a “new” word, indexed with an index q′ that may be in said database of Q words;

(20) estimating where the new word begins and ends in the new SASP;

(21) providing signal values for the new SASP within each of said temporal windows, numbered j = 1, ..., J with J ≥ 2, that are shifted in time relative to each other by selected multiples of a selected displacement time Δt(displ);
(22) for the signal values within each of the time-shifted windows, numbered j = 1, ..., J:

(23) transforming each of the signal values of the new SASP, using said Signal Processing Transform (SPT) operation to obtain new SASP SPT values with said at least first and second transform parameters;

(24) providing a second matrix M′ with second matrix entries equal to the new SASP SPT values, ordered according to said at least first and second transform parameters along first and second matrix axes, respectively, of the second matrix M′;

(25) tessellating the second matrix M′ into a sequence of exhaustive and mutually exclusive M′-cells that correspond to said M-cells for said tessellated matrix M, where each M′-cell is characterized according to at least one selected M′-cell criterion;

(26) providing, for each M′-cell in the second matrix M′, an M′-cell representative value depending upon at least one of the second matrix entries within the M′-cell;

(27) formatting the M′-cell representative values as a vector V′ with vector entry values $v'_k(q';r)$, where q′ refers to the new word or phrase index (k = 1, ..., K);

(28) applying said neural net classifier and said reference set of said optimum first set and said optimum second set of weight coefficients to compute said neural net classifier output values for each of the time-shifted sequences of the new SASP;

(29) receiving the vector entries $v'_k(q';r)$ and forming a first sum

$$S1'(q';q'';r)_h = \sum_k w_{1,k,h}(q'';r;\mathrm{opt})\, v'_k(q';r),$$

where weight coefficients $w_{1,k,h}(q'';r;\mathrm{opt})$ are said optimized first weight coefficients found for a candidate word or phrase (q″) in the database;

(30) forming a first new word activation function $A1'\{S1'(q';q'';r)_h\}$ that depends upon the first sum $S1'(q';q'';r)_h$;

(31) forming a second sum

$$S2'(q';q'';r)_g = \sum_h w_{2,h,g}(q'';r;\mathrm{opt})\, A1'\{S1'(q';q'';r)_h\} \qquad (g = 1, \ldots, G;\ G \ge 1),$$

where weight coefficients $w_{2,h,g}(q'';r;\mathrm{opt})$ are said optimized second weight coefficients found for a candidate word or phrase (q″) in the database;

(32) forming a second new word activation function $A2'\{S2'(q';q'';r)_g\}$ that depends upon the second sum $S2'(q';q'';r)_g$;

(33) providing a set of reference output values $\{A'(q'';\mathrm{ref})_g\}$ associated with each candidate word or phrase (q″) in the database;

(34) forming a comparison difference

$$\Delta 1(q'';q') = \frac{1}{RG} \sum_r \sum_g \left| A2'\{S2'(q';q'';r)_g\} - A'(q'';\mathrm{ref})_g \right|^{p2},$$

where p2 is a selected positive exponent;

(35) comparing the difference Δ1(q″;q′) with a selected threshold value ε(thr;2);

(36) when the difference Δ1(q″;q′) is greater than ε(thr;2), returning to step (28) and repeating the procedures of steps (28)-(35) with another candidate word or phrase (q″) in the database; and

(37) when Δ1(q″;q′) is no greater than ε(thr;2), interpreting this condition as indicating that the present candidate word or phrase (q″) is the “new” word (q′), and indicating that the present candidate word or phrase q″ is likely to be the “new” word q′.

3. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:

replacing at least one of said matrix cell features by a normalized feature for each of said cells corresponding to said matrix M.

4. The method of claim 2, wherein said computer is further 
programmed to execute, and does execute, the following 
actions: 

when at least two distinct words, numbered q1 and q2, in said database satisfy Δ1′(q′;q″=q1) ≈ Δ1′(q′;q″=q2), and Δ1′(q′;q1) and Δ1′(q′;q2) are substantially less than Δ1′(q′;q″) for any word q″ ≠ q1 and q″ ≠ q2 in said database, interpreting this condition as indicating that said new word included in said new SASP cannot be unambiguously identified.

5. The method of claim 2, wherein said computer is further 
programmed to execute, and does execute, the following 
actions: 

choosing said weighting for said weighted points from the 
group of weighting consisting of (i) substantially uni- 
form weighting and (ii) a weighting that decreases 
monotonically as said magnitude of said comparison 
difference increases. 

6. The method of claim 2, wherein said computer is further 
programmed to execute, and does execute, the following 
actions: 

determining said reference set of said weight coefficients to 
be independent of said word number q in said database. 

7. The method of claim 2, wherein said computer is further 
programmed to execute, and does execute, the following 
actions: 

determining said reference set of said weight coefficients so that at least one reference set weight coefficient for a first selected word number q1 in said database differs from a corresponding reference set weight coefficient for a second selected word number q2 in said database.

8. The method of claim 2, wherein said computer is further 
programmed to execute, and does execute, the following 
actions: 

selecting said window width Δt(win) in a range 1-4 sec.

9. The method of claim 2, wherein said computer is further 
programmed to execute, and does execute, the following 
actions: 

selecting each of said matrix cells to be rectangularly
shaped.

10. The method of claim 9, wherein said computer is fur- 
ther programmed to execute, and does execute, the following 
actions: 

selecting at least two of said matrix cells to have different 
sizes. 

11. The method of claim 2, wherein said computer is fur- 
ther programmed to execute, and does execute, the following 
actions: 

choosing said SPT operations from the group of SPT 
operations consisting of (i) a windowed short time inter- 
val Fourier Transform (STFT); (ii) discrete wavelets 
(DWTs) and continuous wavelets (CWTs) using 
Daubechies 5 and 7 bases; (iii) dual tree wavelets 
(DTWTs) with a near sym_a 5,7 tap filter and a Q-shift
14,14 tap filter; (iv) Hartley Transform; (v) Linear Pre- 
dictive Coding (LPC) coefficients; (vi) a moving aver- 
age of a selected number of said sample values with 
uniform weighting; and (vii) a moving average of a 
selected number of said sample values with non-uniform 
weighting. 
12. The method of claim 2, wherein said computer is further
programmed to execute, and does execute, the following
actions: 

selecting said database to include at least one of the words 
“stop”, “go”, “left”, “right”, “alpha”, “omega”, “one”, 
“two”, “three”, “four”, “five”, “six”, “seven”, “eight”, 
“nine” and “ten”. 

13. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:

selecting said error threshold number to lie in a range ε(thr;1) ≤ 0.01.

14. The method of claim 2, wherein said computer is fur- 
ther programmed to execute, and does execute, the following 
actions: 

applying a backpropagation of error method in said neural 
net classifier analysis of said features of said cells of said 
matrix M. 

15. A method for training and using a system to identify a sub-audible signal formed by a source of sub-audible sounds, the method comprising providing a computer that is programmed to execute, and does execute, the following actions:

(1) receiving R signal sequences, numbered r = 1, ..., R (R ≥ 2), with each sequence comprising an instance of a specified sub-audible speech pattern (“SASP”), uttered by the user, and each SASP including at least one word drawn from a selected database of Q words, numbered q = 1, ..., Q (Q ≥ 2);

(2) estimating where each SASP begins and ends for each 
of the signal sequences; 

(3) providing signal values of the received signal, number r, within a temporal window having a selected window width Δt(win);

(4) transforming each of the R SASPs, using a Signal Processing Transform (“SPT”) operation to obtain an SPT value that is expressed in terms of at least one transform parameter having a sequence of parameter values, including a signal frequency and a signal energy associated with the SASP;

(5) providing a first matrix M with first matrix entries equal to the SPT values for the R SASPs, ordered according to each of the at least first and second transform parameters along a first matrix axis and along a second matrix axis, respectively, of the matrix M;

(6) tessellating the matrix M into a sequence of exhaustive and mutually exclusive cells of the matrix entries, referred to as M-cells, with each M-cell containing a collection of contiguous matrix entries, where each M-cell is characterized according to at least one selected M-cell criterion;

(7) providing, for each M-cell, an M-cell representative 
value depending upon at least one of the first matrix 
entries within the M-cell; 

(8) formatting the cell representative values as a vector V with vector entry values $v_k(q;r)$, numbered k = 1, ..., K (K ≥ 2);

(9) analyzing the vector entry values $v_k(q;r)$ using a neural net classifier, having a neural net architecture with at least one neural net hidden layer, and a sequence of estimated weight coefficient values $w_k(q;r)$ associated with that at least one neural net hidden layer, where the neural net classifier provides a sequence of neural net output values A(q;r), equal to a sum over the index k of each of the vector entry values $v_k(q;r)$ multiplied by a corresponding weight coefficient value $w_k(q;r)$;
(10) providing a set of neural net reference output values {A(q;ref)} as an approximation for the sum A(q;r) for the R instances of the SASP (r = 1, ..., R);

(11) forming a difference

$$\Delta(q) = \sum_r \left| A(q;r) - A(q;\mathrm{ref}) \right|^{p},$$

where p is a selected positive exponent;

(12) comparing the difference Δ(q) with a first threshold value ε(thr;1);

(13) when Δ(q) is greater than a first positive threshold value ε(thr;1), adjusting at least one of the weight coefficients $w_k(q;r)$, returning to step (9), and repeating the procedures of steps (9)-(12); and

(14) when Δ(q) is no greater than ε(thr;1), interpreting this condition as indicating that at least one of an optimum set of weight coefficients $\{w_k(q;r;\mathrm{opt})\}$ has been obtained, and using the set of optimum weight coefficients to receive and process a new SASP signal and to estimate whether the received new SASP signal corresponds to a reference word or reference phrase in the selected database.

16. The method of claim 15, wherein said computer is further programmed to execute, and does execute, the following actions:

choosing said SPT operations from a group of SPT operations consisting of (i) a windowed short time interval Fourier Transform (STFT); (ii) discrete wavelets (DWTs) and continuous wavelets (CWTs) using Daubechies 5 and 7 bases; (iii) dual tree wavelets (DTWTs) with a near sym_a 5,7 tap filter and a Q-shift 14,14 tap filter; (iv) Hartley Transform; (v) Linear Predictive Coding (LPC) coefficients; (vi) a moving average of a selected number of said sample values with uniform weighting; and (vii) a moving average of a selected number of said sample values with non-uniform weighting.

17. The method of claim 15, wherein said computer is further programmed to execute, and does execute, the following actions:

selecting at least first and second of said matrix cells to have a cell dimension, measured along a corresponding matrix axis of said matrix M, that is different for the first cell and for the second cell.

18. The method of claim 15, wherein said computer is further programmed to execute, and does execute, the following actions:

(15) receiving a new sub-audible speech pattern SASP1 uttered by said user, comprising an instance of at least one unknown word, referred to as a “new” word, identified with an index q1, that may be but is not necessarily drawn from said database of Q words;

(16) estimating where the new word begins and ends in the new SASP1;

(17) providing signal values of the received SASP1 within 
each of said temporal windows; 

(18) transforming each of the signal values of the new SASP1, using said Signal Processing Transform (SPT) operation to obtain new SASP1 SPT values, where each SASP1 SPT value is expressed in terms of said at least first and second transform parameters, including a signal frequency and a signal energy associated with the SASP1;

(19) providing a second matrix M1 with second matrix entries equal to SPT values for the SASP1, ordered according to each of said at least first and second transform parameters along first and second matrix axes of the second matrix M1;

(20) tessellating the matrix M1 into a sequence of exhaustive and mutually exclusive M1-cells that correspond to said sequence of said M-cells for said matrix M, where each M1-cell is characterized according to said one or more cell criteria for said M-cells;

(21) providing, for each M1-cell, an M1-cell representative value depending upon at least one of the second matrix entry values within the M1-cell;

(22) formatting the M1-cell representative values as a vector V1 with vector entries $v1_k(q1)$, numbered k = 1, ..., K (K ≥ 2), where q1 refers to said index associated with said new word;

(23) analyzing the vector entry values $v1_k(q1)$ using said neural net classifier, having said neural net architecture with said at least one neural net hidden layer, and a sequence $w_k(q1;\mathrm{opt})$ of said optimum weight coefficients $w_k(q1;r1;\mathrm{opt})$, associated with said at least one neural net hidden layer and averaged over said R instances (r1 = 1, ..., R) of said SASP uttered by said user in claim 15;

(24) providing a neural net output value A1(q1), equal to a sum over the index k of each of the vector entry values $v1_k(q1)$ multiplied by the corresponding averaged optimum weight coefficient value $w_k(q1;\mathrm{opt})$;
(25) providing a set of neural net reference output values {A1(q′;ref)} as an approximation for the sum A1(q1) for the R1 instances of the SASP1, where q′ is one of said indices corresponding to said database of Q words;

(26) forming a comparison difference $\Delta 1(q1,q') = \left| A1(q1) - A1(q';\mathrm{ref}) \right|^{p}$, where said quantities A1(q′;ref) and p are determined as in claim 15;

(27) comparing the difference Δ1(q1,q′) with said first threshold value ε(thr;1);

(28) when Δ1(q1,q′) is greater than said first threshold value ε(thr;1), interpreting this condition as indicating that said sub-audible speech pattern SASP1 received is not a sub-audible speech pattern from said database with the corresponding number q1 = q′; and

(29) when Δ1(q1,q′) is no greater than ε(thr;1), interpreting this condition as indicating that said sub-audible speech pattern SASP1 received is likely to be a sub-audible speech pattern from said database, indexed by q′, with the corresponding index q1.



